Let’s Git Started

Prof. Matthew G. Son

University of South Florida

Installation

Git installation: Windows

  1. To use Git/GitHub and Unix commands, install Git Bash for Windows from:

https://git-scm.com/downloads

  1. Configure Git with your name and email address
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

git config --global --list

  1. In VS Code (Positron)
  • Open the command palette with Ctrl + Shift + P
  • Type “Terminal: Select Default Profile”
  • Choose Git Bash as your default terminal

Git for macOS

  1. Install the Xcode command line tools if they are not already installed (if installed, skip)
  • Open terminal
xcode-select --install
  1. Configure Git with your name and email address
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Confirm your setting
git config --global --list

Note

Important

  • You should also create an account on GitHub. Optionally, register with your USF email for student privileges.

  • Check that the terminal understands the git command. Type git in the terminal and confirm that no error message appears.

GitHub Login

Make sure you are able to log in to GitHub.

Introduction to Git/GitHub

Why do we need Git?

Git

  • Distributed version control system, created by Linus Torvalds (2005)

  • “Track changes” made by team members, merge into main, etc.

  • Considered a “must” for software development, and many data science projects.

  • Has a learning curve, but it’s worth learning even for solo projects.

GitHub

  • An online hosting platform built on the Git system

  • Easy to browse other repositories (public, free)

  • Others: GitLab, Bitbucket, GitBucket, etc…

Why Git?

  • Version control: track changes, revert to previous versions

  • Collaboration: multiple people can work on the same project

  • Backup: store your project on the cloud

  • Portfolio: showcase your work to potential employers

  • Open source: contribute to other projects

Is Git the same as GitHub?

Nope.

  • Git can be done completely locally (w/o internet).

  • GitHub hosts Git repositories online (in the cloud).

Does Git/Github work with any files?

Git is primarily designed to handle text files and tracks changes line by line.

  • It can store binary files (e.g. images, PDFs), but it cannot show line-by-line differences for them as it does for text files.

  • GitHub limits the size of a single file (100 MB)

  • It is intended for code tracking. Large data should be stored elsewhere.

Git Terminology/workflow

Git shell commands

Git init

git init initializes a new Git repository in the current working directory.

  • This command creates a new hidden subdirectory named .git
  • If you want to stop using Git for this folder, just remove this subdirectory (this also deletes the whole version history).
git init
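A minimal sketch of the point above, assuming a throwaway folder created with mktemp (the folder is hypothetical; any empty directory works):

```shell
# Hypothetical scratch folder, used only for this demonstration
demo=$(mktemp -d)
cd "$demo"

git init -q      # creates the hidden .git subdirectory
ls -a            # .git now appears in the listing

# Removing .git would delete all history and leave a plain folder:
# rm -rf .git
```

Nothing outside the scratch folder is touched, so this is safe to try before initializing your real class folder.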

Git status

git status shows the status of changes as untracked, modified, or staged.

git status

Understanding Git Status

git status provides information about the state of your working directory and staging area. Here are the key terms:

  • Untracked files: Files that Git isn’t tracking yet.

  • Tracked files: Files that Git has been monitoring for changes.

    • Staged: Files added to the staging area (using git add).
    • Unstaged: Modifications made to tracked files that haven’t been staged yet.
      • Includes “Modified(M)” and “Deleted(D)”
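The terms above can be seen side by side in a small sketch. It assumes a scratch repository and a demo identity, both hypothetical, and uses git status --short for a compact view.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"            # hypothetical identity, local to this repo
git config user.email "demo@example.com"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt"

echo "v2" >> tracked.txt      # tracked file with an unstaged modification
echo "new" > staged.txt
git add staged.txt            # new file added to the staging area
echo "tmp" > untracked.txt    # file Git is not tracking yet

status_out=$(git status --short)
echo "$status_out"
```

In the short format, ?? marks an untracked file, a letter in the first column (such as A) marks a staged change, and a letter in the second column (such as M) marks an unstaged change.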

Git log

git log shows the history of commits and their corresponding hashes.

  • Press q to quit the log view!
git log
git log --oneline # brief view

Stage files

git add stages files for commit.

git add <filename> # stage specific files
git add . # stage all changes (tracked / untracked) in the current folder and its subfolders
git add -A # stage all changes in the whole repository (including parent folders)
git add -u # stage only tracked files (modified, deleted)
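A short sketch of the difference between git add -u and git add ., assuming a scratch repository with one tracked and one untracked file (all names are hypothetical):

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"            # hypothetical identity
git config user.email "demo@example.com"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt"

echo "v2" >> tracked.txt      # modify the tracked file
echo "new" > untracked.txt    # create a file Git does not know yet

git add -u                    # stages the change to tracked.txt only
after_u=$(git status --short)
echo "$after_u"

git add .                     # now the new file is staged as well
after_dot=$(git status --short)
echo "$after_dot"
```

After git add -u the new file is still listed as untracked; after git add . both files are staged.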

Make commit

git commit records a version (snapshot) of your project, identified by a hash.

You should provide a message following -m that describes the version.

  • Treat committing as a serious step in your work, like signing a document.
git commit -m "Commit message, describes the work / version"
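The stage-then-commit sequence can be sketched as follows, assuming a scratch repository; the file name and identity are hypothetical.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"            # hypothetical identity
git config user.email "demo@example.com"

echo "Learning Git is Fun!" > notes.txt
git add notes.txt                    # stage the file
git commit -q -m "My first commit"   # record the version

git log --oneline                    # one line per commit: short hash and message
commit_count=$(git rev-list --count HEAD)
```

If git commit complains that it does not know who you are, the name and email from the installation step have not been configured yet.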

Exercise

1) My First Commit in Git

  1. Make class folder tracked by git (FIN4770)
  • Go to your folder from terminal.

  • In the Shell folder, create a text file For_first_commit.sh.

  • In the file, add text “Learning Git is Fun!”

  • Get out of Shell folder, and back to your class folder.

  • Now initialize Git in your class folder.

  1. After initializing, git now “scans” your folder.
  • Check the status of git, with git status. What do you see?
  1. Let’s make all of the folders and files tracked by Git.
  • Use git add . to stage all untracked files and folders.
  • From now on, all those files are tracked.
  1. Now let’s make a commit.

    • By committing, you are making a version of the project.
    • You can come back to this state of your project in the future.
    • Write an informative commit message!

For commit message, use “My first commit”.

  1. Check the status of git once again.

Git Reset

Use git reset when you want to go back to a previous commit (version).

  • HEAD is a pointer to the commit you are currently on.
  • HEAD~1 means one commit before the current one.

  1. git reset B

  • Changes made after B are kept in your files but “unstaged”
  • Returns to the state “before I staged files and folders”
  • Same as git reset --mixed B
  2. git reset --soft B

  • Changes made after B are kept and “staged”
  • Returns to the state where I was “ready to commit again”
  3. git reset --hard B

  • Matches commit B completely; everything else is discarded.
  • Returns to the state right after I finished committing B.
  • Be careful!! All changes made after B are gone.
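The three modes can be compared in a sketch like the one below. It assumes a hypothetical helper, make_repo, that builds a scratch repository with two commits (B, then C); each mode is then applied to a fresh copy.

```shell
# Hypothetical helper: builds a scratch repo with commits B and C, prints its path
make_repo() {
  repo=$(mktemp -d)
  git -C "$repo" init -q
  git -C "$repo" config user.name "Demo User"
  git -C "$repo" config user.email "demo@example.com"
  echo "line 1" > "$repo/file.txt"
  git -C "$repo" add file.txt
  git -C "$repo" commit -q -m "B: first version"
  echo "line 2" >> "$repo/file.txt"
  git -C "$repo" add file.txt
  git -C "$repo" commit -q -m "C: second version"
  echo "$repo"
}

mixed=$(make_repo); git -C "$mixed" reset -q HEAD~1          # default (--mixed)
soft=$(make_repo);  git -C "$soft"  reset -q --soft HEAD~1
hard=$(make_repo);  git -C "$hard"  reset -q --hard HEAD~1

git -C "$mixed" status --short   # " M file.txt": change kept, unstaged
git -C "$soft"  status --short   # "M  file.txt": change kept, staged
git -C "$hard"  status --short   # prints nothing: the change is discarded
```

In all three cases the branch now points at B; the modes differ only in what happens to the work done after B.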

Exercise

Second commit & Reset

  1. Let’s make a faulty second commit and then undo it:
  • Go to your folder from terminal.

  • In the Shell folder, edit For_first_commit.sh file:

    • Append line: “Learning Git is Not Fun at all!”
  • Get out of Shell folder, and back to your class folder.

  • Then check the status. What do you see?

Remember this status. We will come back to this state after the reset.

  1. Add all changes to the staging area.

  2. Commit with message “Test second commit”

  • This is now your second version.
  1. Check the status.

  2. Check the commit logs with git log or git log --oneline.

  1. Now let’s get back to the previous version (Q1).
  • Run git reset HEAD~1
  • or git reset (hash)
  1. Check the status. Can you tell where you are?
  1. Let’s make the faulty second commit once again.
  • First add the change to the staging area git add .
  • Commit with message “Yet another test commit”
  • Check status and log to see where you are.
    • Make sure you are on the second commit.
  1. Let’s hard reset to the state right after our first commit.
  • We will remove all the changes made after the first commit.
  • Run git reset --hard HEAD~1.
  • Check the status and log.

Set Remote repository

The process so far is only for your local version management, which is also completely fine for local work.

However, you can also keep your repository synchronized with a cloud server: GitHub.

git remote add adds a remote repository.

  • origin is the conventional name for the remote repository.
  • main is your local branch name.
git remote add <remote_name> <remote_url>

# example: Remote Repo named as origin
git remote add origin https://github.com/username/repositoryname

Push to repository

git push sends your commit to the remote repository.

# When pushing for the first time
git push -u origin main 

# From then 
git push
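To try the remote workflow without a GitHub account, the sketch below uses a local bare repository as a stand-in for the remote. This is an assumption made for the demonstration; with GitHub, the path would be the https:// address of your repository.

```shell
work=$(mktemp -d)
remote="$work/remote.git"            # hypothetical stand-in for a GitHub URL
git init -q --bare "$remote"

mkdir "$work/local"
cd "$work/local"
git init -q
git config user.name "Demo User"     # hypothetical identity
git config user.email "demo@example.com"
echo "hello" > README.txt
git add README.txt
git commit -q -m "My first commit"
git branch -M main                   # name the local branch "main"

git remote add origin "$remote"      # register the remote under the name origin
git push -q -u origin main           # first push: -u links main to origin/main
git remote -v                        # shows the fetch and push addresses
git branch -r                        # lists origin/main
```

After the first push with -u, a plain git push is enough because the local branch remembers its upstream.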

Pull from repository

git pull performs two operations in one step:

  • fetch: downloads changes, but does not apply them to your local files
    • Useful if you want to see the differences between the two first.
  • merge: applies the changes to your local files
git pull

# if the two steps need to be run separately
git fetch
git merge
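A sketch of running the two steps separately, assuming a local bare repository as a stand-in for GitHub and two hypothetical working copies named alice and bob:

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git        # hypothetical stand-in for GitHub

git init -q alice
cd alice
git config user.name "Alice"; git config user.email "alice@example.com"
echo "line 1" > file.txt
git add file.txt; git commit -q -m "A: initial"
git branch -M main
git remote add origin "$work/remote.git"
git push -q -u origin main
cd ..

git clone -q -b main remote.git bob  # Bob downloads the project

cd alice                             # Alice publishes a new commit
echo "line 2" >> file.txt
git commit -q -am "B: add line 2"
git push -q
cd ../bob

git fetch -q                         # download only; Bob's files are unchanged
git log --oneline main..origin/main  # commits on the remote that Bob lacks
git diff main origin/main            # inspect the differences before applying
git merge -q origin/main             # apply them (a fast-forward here)
```

Fetching first is useful when you want to review incoming changes before they touch your working files.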

Exercise

2) My first push to GitHub

  1. Create your remote repository on Github website:
  • Create public repo

  • Name your repository nicely

  • DO NOT ADD README FILE (leave it unchecked)

  • Copy the address somewhere

    • The address will be like https://github.com/YourGithubId/YourRepoName.git

Git clone

What if you want to download someone else’s remote repository to your local machine? git clone downloads a remote repository for the first time.

git clone <repo_url>

# example
git clone https://github.com/example-user/example-repo.git
  1. Name your local, initial branch as “main”
git branch -M main

Check your branches from current directory with

git branch

  1. Set your GitHub repository address as the remote “origin”
git remote add origin https://github.com/yourGithubId/Reponame.git

Browse your branches now again, and check the differences

git branch
git branch -r
git branch -a

  1. Push your local commit to upstream (remote repo)
git push -u origin main

Browse your GitHub site and check that the files are uploaded.

https://github.com/yourGithubId/Reponame.git

  • Share the address with me.

Note

By default, Git only tracks files, not empty folders. A folder is tracked once it contains files.

2) My Second Commit and Push to Github

Now, make changes to your R/my_first_R_code.R file:

  • Add the line print("Hello world!") to the code and save.

  • Then add, commit, and push to GitHub.

    • commit message: “My second commit”
  • Browse history with:

git log

3) Reset changes (Time machine)

Now, suppose your second commit (version) was an error.

You can go back to the previous commit with:

git reset <hash>

or

git reset HEAD~1 # reset to 1 step before

At this point, ONLY your local repository is back to the state before the second commit. Your GitHub remote still has the second commit (it is ahead of your local branch)!

If you want your remote to be back to the first commit as well:

git push --force

Note that the --force option is required here, because the remote is ahead of your local branch and a normal push would be rejected.

4) Clone Class folder

  1. Navigate to the Class_repo subfolder
cd /path/to/FIN4470/Class_repo
pwd # check your location
  1. Clone the class folder to your machine
git clone <repository_url>
  • I will be uploading class materials here from now on.

For future updates, use

# Make sure you are on the right folder
git pull

Github and IDE

When using GitHub from other apps (e.g. Positron, VS Code), you’ll use a PAT (personal access token) instead of a password.

  1. Install package from R and generate token
install.packages("usethis") # package installation (the name must be quoted)
usethis::create_github_token()
  1. Generate the token on the page that opens, with no expiration (or a long enough one)
  • Copy the token to the clipboard (you won’t see the token again)
  • Store it somewhere temporarily
  • Use it as your password in Positron

More Git concepts

Understanding Git File States

Tracked vs Untracked

  • Tracked files: Files that are under Git version control (i.e., previously committed).
  • Untracked files: Files that have been created but have not yet been added to Git.

Staged vs Unstaged

  • Staged files: Files that have been marked (using git add) and are ready to be included in the next commit.
  • Unstaged files: Tracked files that have been modified since the last commit but have not been staged again.

Workflow Example

  1. You create a new file example.txt → Untracked

  2. Running git add example.txt stages the file → Staged

  3. After modifying example.txt, the new changes are unstaged until you run git add example.txt again

Three pull methods

By default, git pull is equivalent to

git fetch # download update but do not apply changes to local
git merge # merge changes

git pull has three methods:

  • git pull --no-rebase (merge, the traditional default)

  • git pull --rebase

  • git pull --ff-only

git merge

Combines changes into a new merge commit. You will need to combine and resolve conflicts.

  • B, C are your local changes

Example: Git merge scenario

Content of file.txt

Alice edits the second line, commits, and pushes to main first:

Bob also edits the same line, without pulling Alice’s changes first. He edits and commits locally:

When Bob tries to push changes with git push, Git will reject the push because his local branch is behind the remote.

Bob runs git pull, which is git fetch followed by git merge; Git reports the conflict.

The file.txt now contains conflict markers

To resolve conflict, Bob should modify the file manually.

Bob stages this file and commits the resolution:

After resolving, Bob pushes with git push.
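The whole scenario can be replayed locally. The sketch below assumes a local bare repository standing in for GitHub; the file contents and commit messages are hypothetical, chosen only to produce a conflict on the second line.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git        # hypothetical stand-in for GitHub

git init -q alice
cd alice
git config user.name "Alice"; git config user.email "alice@example.com"
printf 'line 1\nline 2\n' > file.txt
git add file.txt; git commit -q -m "A: initial"
git branch -M main
git remote add origin "$work/remote.git"
git push -q -u origin main
cd ..

git clone -q -b main remote.git bob  # Bob clones before Alice's next change
git -C bob config user.name "Bob"
git -C bob config user.email "bob@example.com"

cd alice                             # Alice edits line 2 and pushes first
printf 'line 1\nline 2 (Alice)\n' > file.txt
git commit -q -am "B: Alice edits line 2"
git push -q
cd ../bob

printf 'line 1\nline 2 (Bob)\n' > file.txt   # Bob edits the same line
git commit -q -am "C: Bob edits line 2"

git push -q 2>/dev/null || echo "push rejected: the remote has new commits"
git pull -q --no-rebase >/dev/null 2>&1 || echo "merge stopped on a conflict"
cat file.txt                         # shows the <<<<<<<, =======, >>>>>>> markers

# Bob writes the final content, then stages and commits the resolution
printf 'line 1\nline 2 (Alice and Bob)\n' > file.txt
git add file.txt
git commit -q -m "Merge: resolve conflict on line 2"
git push -q
git log --oneline --graph            # the merge commit joins the two lines of history
```

The merge commit has two parents (Alice's B and Bob's C), which is what distinguishes this history from the linear one produced by a rebase.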

git rebase

Git replays your local branch on top of the remote branch, creating a linear history:

Example: Git rebase scenario

Content of file.txt

Alice edits the second line, commits, and pushes to main first:

Bob also edits the same line, without pulling Alice’s changes first. He edits and commits locally:

When Bob tries to push changes with git push, Git will reject the push because his local branch is behind the remote.

Bob runs git pull --rebase, which is git fetch followed by git rebase:

Instead of creating a merge commit,

  • git rewinds Bob’s local commit C

  • applies Alice’s changes from remote (origin/main) B

  • reapplies C on top of B

When there’s no conflict (i.e., they did not change the same line of code), git pull --rebase won’t raise any conflict message.

However, in our scenario, there’s conflict since both modified the same line, so:

  • rebase is halted

  • should be continued after conflict resolution

Similar to merge, conflict should be resolved manually.

The file.txt now contains conflict markers

To resolve conflict, Bob should modify the file manually.

Bob stages this file and continues the rebase (no separate commit is needed)

git add file.txt
git rebase --continue

After that Bob pushes with git push. The history is linear.
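The rebase scenario can be replayed the same way. This sketch assumes a local bare repository standing in for GitHub and hypothetical file contents. GIT_EDITOR=true is used only so that the script does not stop to open an editor; interactively, plain git rebase --continue is enough.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git        # hypothetical stand-in for GitHub

git init -q alice
cd alice
git config user.name "Alice"; git config user.email "alice@example.com"
printf 'line 1\nline 2\n' > file.txt
git add file.txt; git commit -q -m "A: initial"
git branch -M main
git remote add origin "$work/remote.git"
git push -q -u origin main
cd ..

git clone -q -b main remote.git bob
git -C bob config user.name "Bob"
git -C bob config user.email "bob@example.com"

cd alice                             # Alice edits line 2 and pushes first
printf 'line 1\nline 2 (Alice)\n' > file.txt
git commit -q -am "B: Alice edits line 2"
git push -q
cd ../bob

printf 'line 1\nline 2 (Bob)\n' > file.txt   # Bob edits the same line
git commit -q -am "C: Bob edits line 2"

git pull -q --rebase >/dev/null 2>&1 || echo "rebase stopped on a conflict"

# Bob writes the final content, stages it, and continues the rebase
printf 'line 1\nline 2 (Alice and Bob)\n' > file.txt
git add file.txt
GIT_EDITOR=true git rebase --continue >/dev/null
git push -q
git log --oneline                    # A, B, then Bob's commit replayed on top
```

No merge commit is created: Bob's commit is rewritten so that its only parent is Alice's B.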

git merge --ff-only

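Used when you expect no conflicts.

Use it when you want a linear history and only want to move your local branch pointer forward, without creating a merge commit or rewriting commits.

  • Git refuses if your local branch has commits that the other branch does not (the branches have diverged), even when the changes would not conflict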
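A sketch of both outcomes, using a second local branch (hypothetically named feature) in place of the remote branch so that no server is needed:

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"     # hypothetical identity
git config user.email "demo@example.com"
echo "a" > a.txt; git add a.txt; git commit -q -m "A"
git branch -M main

git checkout -q -b feature           # stands in for the remote branch
echo "b" > b.txt; git add b.txt; git commit -q -m "B on feature"
git checkout -q main

# main has no commits of its own, so its pointer can simply move forward
git merge -q --ff-only feature && echo "fast-forwarded"

# Make the two branches diverge, touching different files
echo "c" > c.txt; git add c.txt; git commit -q -m "C on main"
git checkout -q feature
echo "d" > d.txt; git add d.txt; git commit -q -m "D on feature"
git checkout -q main

git merge -q --ff-only feature 2>/dev/null || echo "refused: branches have diverged"
```

The second merge is refused even though the two commits touch different files and would not conflict, so main stays at commit C.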

.gitignore

Git keeps track of all changes within the project directory:

  • New file, folder

  • Modifications (changes or updates)

  • Deletion

If there are files or folders you don’t want to track, list them in a .gitignore file.
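A minimal sketch of a .gitignore in action. The two patterns and the file names are hypothetical examples; adapt them to your own project.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
mkdir data
echo "debug output" > notes.log
echo "1,2,3" > data/big.csv
echo 'print("hi")' > script.R

# Hypothetical patterns: ignore every .log file and the whole data/ folder
printf '*.log\ndata/\n' > .gitignore

git status --short              # lists only .gitignore and script.R
git check-ignore -v notes.log   # shows which pattern matched this file
```

The .gitignore file itself is normally committed, so that everyone working on the project ignores the same files.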

git clean

If you want to remove untracked (not unstaged!) files and folders:

git clean -nd # dry run (show what will be removed)
git clean -df # remove untracked files
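A sketch showing that git clean removes only untracked items, assuming a scratch repository with hypothetical file names:

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"     # hypothetical identity
git config user.email "demo@example.com"
echo "keep" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt"

echo "edit" >> tracked.txt           # unstaged change to a tracked file
echo "tmp" > scratch.txt             # untracked file
mkdir build
echo "out" > build/out.txt           # untracked folder

git clean -nd          # dry run: lists what would be removed
git clean -df          # actually removes scratch.txt and build/
git status --short     # the modified tracked.txt is still there
```

Deleted files do not go to a recycle bin, so running the dry run first is a sensible habit.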